# Large-scale Corpus Training

## Roberta Large Japanese
A large Japanese RoBERTa model pretrained on Japanese Wikipedia and the Japanese portion of CC-100, suitable for Japanese natural language processing tasks.
Tags: Large Language Model · Transformers · Japanese
Author: nlp-waseda · Downloads: 227 · Likes: 23
## Opus Mt Tc Big Cat Oci Spa En
A neural machine translation model for translating from Catalan, Occitan, and Spanish to English, part of the OPUS-MT project.
Tags: Machine Translation · Transformers · Supports Multiple Languages
Author: Helsinki-NLP · Downloads: 24 · Likes: 2
## Opus Mt Tc Big En Ar
A neural machine translation model for translating from English to Arabic, part of the OPUS-MT project, with support for multiple target dialects.
Tags: Machine Translation · Transformers · Supports Multiple Languages
Author: Helsinki-NLP · Downloads: 4,562 · Likes: 23
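Multi-target OPUS-MT models such as the English-to-Arabic one above select the output variety via a sentence-initial language token of the form `>>xxx<<`. A minimal sketch of composing such an input; the helper name `mark_target` is illustrative, not part of any library:

```python
# Multi-target OPUS-MT models expect a sentence-initial token naming the
# desired target language/dialect, e.g. ">>ara<<" for Modern Standard Arabic.
# `mark_target` is an illustrative helper, not a library function.

def mark_target(text: str, target_lang: str) -> str:
    """Prepend the OPUS-MT target-language token to a source sentence."""
    return f">>{target_lang}<< {text}"

source = mark_target("You can now translate the sentence.", "ara")
print(source)  # >>ara<< You can now translate the sentence.
```

The marked string is then passed to the model's tokenizer like any other source sentence.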
## Icebert Xlmr Ic3
An Icelandic masked language model based on the RoBERTa-base architecture, fine-tuned from xlm-roberta-base.
Tags: Large Language Model · Transformers · Other
Author: mideind · Downloads: 24 · Likes: 0
## Icebert Ic3
An Icelandic masked language model based on the RoBERTa-base architecture, trained with the fairseq framework.
Tags: Large Language Model · Transformers · Other
Author: mideind · Downloads: 16 · Likes: 0
## Berdou 500k
A Portuguese BERT model, based on Bertimbau-Base, fine-tuned with masked language modeling (MLM) on 500,000 instances from the Brazilian Federal Official Gazette.
Tags: Large Language Model · Transformers
Author: flavio-nakasato · Downloads: 16 · Likes: 0
## Gerpt2 Large
License: MIT
GerPT2 is the large variant of a German GPT2, trained on the CC-100 corpus and German Wikipedia, and performs strongly on German text generation tasks.
Tags: Large Language Model · German
Author: benjamin · Downloads: 75 · Likes: 9
## Bert Base Qarib60 1970k
QARiB is a BERT model for Arabic and its dialects, trained on approximately 420 million tweets and 180 million sentences of text, supporting various Arabic NLP tasks.
Tags: Large Language Model · Arabic
Author: ahmedabdelali · Downloads: 41 · Likes: 1
## Bert Base Qarib60 1790k
QARiB is a BERT model for Arabic and its dialects, trained on approximately 420 million tweets and 180 million sentences of text, supporting various downstream NLP tasks.
Tags: Large Language Model · Arabic
Author: ahmedabdelali · Downloads: 16 · Likes: 2
## Indot5 Base
A T5 (Text-to-Text Transfer Transformer) base model pretrained on the Indonesian portion of the mC4 dataset; it requires fine-tuning before use.
Tags: Large Language Model · Transformers · Other
Author: Wikidepia · Downloads: 635 · Likes: 1
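Because T5 casts every task as text-to-text, fine-tuning a checkpoint like the one above amounts to preparing (input, target) string pairs, usually with a short task prefix. A minimal sketch; the helper name and the Indonesian prefix `"ringkas: "` are illustrative assumptions, not mandated by the IndoT5 checkpoint:

```python
# T5 fine-tuning data is plain (input, target) text pairs; a short task
# prefix tells the model which task a pair belongs to. The prefix string
# here is illustrative, not fixed by the pretrained checkpoint.

def make_t5_pair(document: str, summary: str, prefix: str = "ringkas: ") -> tuple[str, str]:
    """Build one text-to-text training pair for a summarization-style task."""
    return prefix + document, summary

inp, tgt = make_t5_pair("Jakarta adalah ibu kota Indonesia.", "Ibu kota: Jakarta.")
print(inp)  # ringkas: Jakarta adalah ibu kota Indonesia.
```

Such pairs are then tokenized and fed to a standard sequence-to-sequence training loop.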
## Rubert Base Cased Conversational
A Russian conversational model trained on OpenSubtitles, Dirty, Pikabu, and the social-media sections of the Taiga corpus.
Tags: Large Language Model · Other
Author: DeepPavlov · Downloads: 165.49k · Likes: 20
## Sroberta F
License: Apache-2.0
A RoBERTa model trained on a 43GB dataset of Croatian and Serbian, supporting masked language modeling tasks.
Tags: Large Language Model · Transformers · Other
Author: Andrija · Downloads: 51 · Likes: 2
## Est Roberta
Est-RoBERTa is a monolingual Estonian BERT-like model based on the RoBERTa architecture, trained on 2.51 billion tokens of Estonian text.
Tags: Large Language Model · Transformers · Other
Author: EMBEDDIA · Downloads: 155 · Likes: 4
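Several of the entries above (IceBERT, Berdou, QARiB, sRoBERTa, Est-RoBERTa) are masked language models, queried by blanking out a token, but the mask token differs by family: BERT-style tokenizers use `[MASK]`, while RoBERTa/XLM-R-style tokenizers use `<mask>`. A minimal sketch; `build_masked_input` is an illustrative helper (real code would read the token from the tokenizer's `mask_token` attribute):

```python
# BERT-style tokenizers use "[MASK]"; RoBERTa/XLM-R-style use "<mask>".
# `build_masked_input` is an illustrative helper for composing a fill-mask
# query; in practice the token comes from the loaded tokenizer itself.

MASK_TOKENS = {"bert": "[MASK]", "roberta": "<mask>"}

def build_masked_input(template: str, family: str) -> str:
    """Substitute the family-appropriate mask token into a template."""
    return template.replace("{mask}", MASK_TOKENS[family])

print(build_masked_input("Tallinn on Eesti {mask}.", "roberta"))
# Tallinn on Eesti <mask>.
```

Using the wrong mask token is a common source of silently poor fill-mask predictions, since the model never saw that literal string during pretraining.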